China-US grain trade shapes the spatial genetic pattern of common ragweed in East China cities

Common ragweed is an invasive alien species that causes severe allergies in urban residents. Understanding its urban invasion pathways is crucial for effective control. However, knowledge is limited: most studies focus on agricultural and natural areas, and studies based on occurrence records carry uncertainties. We address this gap through a study in East China cities that combines population genetics and occurrence records. Leaf samples from 37 urban common ragweed populations across 15 cities are collected. Extraction of genomic and chloroplast DNA enables analysis of spatial genetic patterns and gene flow. Additionally, international grain trade data are examined to trace invasion sources. Results indicate that the spatial genetic patterns were shaped by multiple introductions over time. We infer that the modern grain trade between the United States and China is the primary invasion pathway. In addition, cities that act as transportation hubs and grain-importing ports might disperse common ragweed to urban areas. Invasive species control should account for cities as potential landing and spread hubs of common ragweed.


DNA extraction, nuclear microsatellite and cpDNA amplifications, and fragment sizing
The CTAB method was used to extract genomic DNA from silica-dried leaf tissue from 37 populations in East China cities (Lee, Milgroom, & Taylor, 1988; Zhang, Zhang, Liu, Wen, & Wang, 2010). In brief, the cell walls of the leaves were broken down in liquid nitrogen. The CTAB extraction buffer was then added, and after incubation at 65 °C, the DNA was purified with phenol:chloroform (1:1), precipitated with isopropanol, and rinsed with 70% ethanol. Finally, the DNA was dissolved in 40 μl of sterilized 1× TE solution.
Genomic DNA from the 37 populations in East China cities, four French and Italian populations, and one North American population was used for amplification and fragment sizing of nuclear microsatellites. Microsatellite primer sequences were taken from GenBank sequences FJ595149 (Ambart04), FJ5950 (Ambart06), FJ5952 (Ambart17), FJ5953 (Ambart18), FJ595155 (Ambart24) and FJ595156 (Ambart27) (Supplementary Table 4). In-house ROX-labeled size standards (DeWoody et al., 2004) were used with Biosystems Hi-Dye® formamide. The PCR profile was as follows: initial denaturation at 94 °C for 5 min; 35 cycles of denaturation at 94 °C for 30 s, annealing at 54 °C for 35 s, and extension at 72 °C for 40 s; and a final extension at 72 °C for 3 min.
Genomic DNA from the 37 populations in East China cities was used to sequence two spacer regions (atpH-atpF and psbK-psbI) of chloroplast DNA (cpDNA). The PCR profile was as follows: initial denaturation at 94 °C for 5 min; 35 cycles of denaturation at 94 °C for 30 s, annealing at 55 °C for 35 s, and extension at 72 °C for 40 s; and a final extension at 72 °C for 3 min. In total, we obtained 467 cpDNA sequence chromatograms. Because of high AT content and poly-A structures, 29 chromatograms were excluded from the analysis; the remaining 438 were readable and were used for alignment. The cpDNA chromatograms of sequenced samples from populations in East China cities were aligned using MAFFT (Katoh & Toh, 2010), and the alignment was then manually checked and edited in Geneious version 11.0.4 (Biomatters Ltd., Auckland, New Zealand). In the alignment, the 5' UTR and 3' UTR contained missing and ambiguous sites, so both regions were excluded from the following analyses. All sequences were concatenated and aligned into a super-alignment in Geneious version 11.0.4.
Genomic DNA extraction, amplification, and sequencing of nuclear microsatellites and cpDNA were completed at Bio-ulab company (Beijing, China).

Genetic pattern of common ragweed in East China cities
Tests of genotypic linkage equilibrium were performed for each pair of loci in GENEPOP 4.7.0 (Rousset, 2008) with 10,000 Markov chains, 100 batches, and 50,000 iterations per batch. Null allele frequencies were estimated with the EM algorithm implemented in GENEPOP 4.7.0 (Rousset, 2008). Genetic diversity at the nuclear microsatellite loci was summarized by the number of different alleles (NA), expected heterozygosity (HE), and inbreeding coefficients (FIS), estimated in GenAlEx 6.51 (Peakall & Smouse, 2012) and GENEPOP 4.7.0 (Rousset, 2008). Deviations from Hardy-Weinberg equilibrium were assessed across loci and sampling localities by FIS (Weir & Cockerham, 1984), and significance was tested with 10,000 Markov chains, 20 batches, and 5,000 iterations per batch in GENEPOP 4.7.0 (Rousset, 2008). P-values were adjusted for multiple testing with the Bonferroni correction in base R version 4.0.2 (R Core Team, 2018). Potential bottlenecks were tested with BOTTLENECK version 1.2.0.2 (J.M. Cornuet & Luikart, 1996). The two-phase mutation model (TPM) was used, with the variance of the geometric distribution set to 0.36 and the proportion of the stepwise mutation model (SMM) in the TPM set to 0.000, because it is considered the most suitable model for microsatellite data (J.-M. Cornuet & Luikart, 2018). The mode-shift indicator under the TPM was used to evaluate the bottleneck status of the populations. Pairwise population FST values with 95% confidence intervals were calculated in the R package diveRsity (Keenan, McGinnity, Cross, Crozier, & Prodöhl, 2013) with 1,000 permutations.
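As a rough illustration of two quantities used above, the following sketch computes expected heterozygosity at a single locus as HE = 1 − Σ p_i² (where p_i is the frequency of allele i) and applies a Bonferroni adjustment to a set of p-values. It is not the GenAlEx, GENEPOP, or R code used in the study; the function names, the genotypes, and the p-values are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(genotypes):
    """H_E = 1 - sum(p_i^2) at one locus; genotypes are (allele, allele) pairs."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    total = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical fragment sizes (bp) for four diploid plants at one locus.
toy_genotypes = [(100, 102), (100, 100), (102, 104), (100, 104)]
he = expected_heterozygosity(toy_genotypes)

# Hypothetical raw p-values from three Hardy-Weinberg tests.
adjusted = bonferroni([0.01, 0.04, 0.5])
```

The sketch uses the uncorrected estimator; software packages may additionally report a sample-size-corrected (unbiased) version.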
The number of haplotypes (h), haplotype diversity (HD), and nucleotide diversity (π) were computed with DnaSP 6 (Rozas et al., 2017). The geographical distribution of cpDNA haplotypes was visualized in ArcGIS Desktop 10.2 (ESRI, Redlands, CA, USA) to explore the relationships among the populations in East China cities. A median-joining network of haplotypes was generated in NETWORK 10.1 (Bandelt, Forster, & Röhl, 1999) without an outgroup and with the MP option (Bandelt et al., 1999). A rooted neighbor-joining tree with Ambrosia trifida L. as the outgroup was constructed in Geneious version 11.0.4 (Biomatters Ltd., Auckland, New Zealand) under the Tamura-Nei genetic distance model with 1,000 replicates. Genetic clustering of the populations in East China cities was analyzed with the Bayesian clustering algorithms of STRUCTURE version 2.3.3 (Pritchard, Stephens, & Donnelly, 2000) and GENELAND version 4.0.3 (Guillot, Estoup, Mortier, & Cosson, 2005). Both programs classify individuals into an a priori unknown number of genetic groups using a probabilistic approach based on their multilocus genotypes at the six nuclear microsatellites and the cpDNA haplotypes. STRUCTURE used only prior information about the sampling populations, whereas GENELAND also used the longitude and latitude of the sampled populations and applied spatially explicit algorithms to optimize inference.
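For readers unfamiliar with the cpDNA summary statistics named above, the sketch below computes h, HD, and π from a toy alignment using the standard definitions: HD = n/(n − 1) · (1 − Σ p_i²) and π as the mean number of pairwise differences per site. It assumes gap-free sequences of equal length and is not a substitute for DnaSP, which handles gaps and missing data; the function name and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_stats(sequences):
    """Return (h, Hd, pi) for aligned, equal-length sequences without gaps."""
    n = len(sequences)
    length = len(sequences[0])
    counts = Counter(sequences)
    h = len(counts)
    # Haplotype diversity with the n / (n - 1) sample-size correction.
    hd = n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))
    # Nucleotide diversity: mean pairwise differences, divided by site count.
    pairs = list(combinations(sequences, 2))
    diffs = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    pi = diffs / len(pairs) / length
    return h, hd, pi


# Hypothetical four-sequence alignment (not data from the study).
toy_alignment = ["AAAA", "AAAA", "AAAT", "AATT"]
h, hd, pi = haplotype_stats(toy_alignment)
```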
The spatial model assuming uncorrelated allele frequencies was implemented in GENELAND version 4.0.3 (Guillot et al., 2005). The maximum number of nuclei in the Poisson-Voronoi tessellation was set to 500. The inference of population structuring was based on 10 independent runs, each allowing the number of clusters to vary between 1 and 10. Each run consisted of 500,000 Markov chain Monte Carlo (MCMC) iterations with a thinning interval of 100. The data were post-processed with a burn-in of 1,000 iterations and 500 by 700 cells in the spatial domain. The most likely number of clusters K was chosen by GENELAND according to the density of the posterior probability chain.
STRUCTURE version 2.3.3 (Pritchard et al., 2000) was run using the admixture model, which assumed correlated allele frequencies but no location priors. Twenty independent runs were performed for each K from 1 through 10, each with 200,000 initial burn-in steps followed by 1,000,000 iterations. The most likely number of clusters K was taken as the value at the peak of Delta K, following the method of Evanno et al. (2005) as implemented in the STRUCTURE HARVESTER web version 0.6.94 (Earl & vonHoldt, 2012). CLUMPP v1.1.2 (Jakobsson & Rosenberg, 2007) was used to calculate the mean of the permuted matrices. Results were illustrated with Origin 2018 (Northampton, MA, USA).
BayesAss v3.0.4 (Wilson & Rannala, 2003) was used to estimate recent gene flow as migration rates based on the nuclear microsatellite markers. Owing to limited computing power, migration rates among the 37 populations in East China cities were estimated at the city level. We ran four chains with different seeds, each of 500,000,000 steps with a burn-in of 100,000,000 steps. The delta values for a, m, and f were 0.8, 0.5, and 0.9, respectively. Gelman-Rubin diagnostics in the R package coda (Plummer, Best, Cowles, & Vines, 2006) were used to diagnose convergence of the MCMC chains. In our analysis, every variable in the MCMC chains had a Gelman-Rubin potential scale reduction factor below 1.1 and an effective sample size larger than 200.
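The convergence criterion above can be illustrated with a minimal sketch of the basic Gelman-Rubin potential scale reduction factor for a single parameter, computed from the within-chain variance W and the between-chain variance B. This is a simplified textbook form, not the coda implementation, which includes further sampling-variability corrections not reproduced here; the function name and the toy chains are hypothetical.

```python
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Basic Gelman-Rubin PSRF for one parameter; chains have equal length."""
    n = len(chains[0])
    # W: mean of the within-chain sample variances.
    within = mean([variance(chain) for chain in chains])
    # B: n times the sample variance of the chain means.
    between = n * variance([mean(chain) for chain in chains])
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Hypothetical, very short chains used only to show the arithmetic.
toy_chains = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
psrf = potential_scale_reduction(toy_chains)
converged = psrf < 1.1  # the threshold stated in the text
```

Values close to 1 indicate that the chains are sampling similar distributions, which is why a cutoff such as 1.1 is commonly applied to every monitored variable.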
To explore whether cities have a high dispersal potential for common ragweed in East China, we built a genetic graph, constructed as a directed weighted population graph of 15 metapopulations. The migration rates estimated by BayesAss v3.0.4 (Wilson & Rannala, 2003) were used to represent connectivity among the sampled cities. A threshold of m = 0.01 was applied to prune edges, producing a minimal directed graph (Dale & Fortin, 2010). The R packages POPGRAPH (Dyer, 2009) and IGRAPH (Csardi & Nepusz, 2006) were used to construct the graph and calculate graph centrality measures. We calculated the Strength centrality measure, which in our genetic graph is an index of the magnitude of genetic exchange between a given node and all nodes connected to it.
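The graph construction described above can be sketched as follows: edges below the migration-rate threshold are dropped, and the Strength of each node is the sum of the weights of its remaining edges. This is an illustration only, not the POPGRAPH or IGRAPH code used in the study. Two choices here are assumptions not stated in the text: edges exactly at the threshold are kept, and Strength sums both incoming and outgoing edges. The city labels and rates are placeholders.

```python
from collections import defaultdict


def strength_centrality(rates, threshold=0.01):
    """Sum edge weights per node after dropping edges below the threshold.

    rates maps (source, target) to a migration rate.
    """
    in_strength = defaultdict(float)
    out_strength = defaultdict(float)
    for (source, target), rate in rates.items():
        if rate >= threshold:  # assumption: edges at the threshold are kept
            out_strength[source] += rate
            in_strength[target] += rate
    nodes = {node for edge in rates for node in edge}
    # Assumption: total strength counts incoming plus outgoing weights.
    return {node: in_strength[node] + out_strength[node] for node in nodes}


# Hypothetical migration rates among three placeholder cities.
toy_rates = {
    ("A", "B"): 0.05,
    ("B", "A"): 0.005,  # below the threshold, so this edge is pruned
    ("A", "C"): 0.02,
    ("C", "B"): 0.10,
}
strength = strength_centrality(toy_rates)
```

In this toy graph, node B ends up with the highest Strength because it receives the two largest retained flows.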

Similarity of common ragweed populations in East China cities to those in other countries
To identify the source populations of common ragweed in East China cities, each plant was assigned to French and Italian, modern North American, or historical North American populations. The North American samples were grouped by the genetic clusters identified by Martin et al. (2014). The Bayesian criterion for likelihood estimation (Rannala & Mountain, 1997) was used because it performed better in a previous common ragweed assignment test (Genton, Shykoff, & Giraud, 2005). These tests were performed in GENECLASS2 (Piry et al., 2004) on the nuclear microsatellite markers, with an assignment threshold of 0.05 and the Monte Carlo resampling algorithm of Paetkau et al. (2004) with 10,000 simulated individuals.
Genetic diversity indices (NA, HE, and FIS) of the French and Italian populations and the modern North American populations were estimated in GenAlEx 6.51 (Peakall & Smouse, 2012) and GENEPOP 4.7.0 (Rousset, 2008). Deviations from Hardy-Weinberg equilibrium were assessed across loci and sampling localities by FIS (Weir & Cockerham, 1984), and significance was tested with 10,000 Markov chains, 20 batches, and 5,000 iterations per batch in GENEPOP 4.7.0 (Rousset, 2008). An analysis of variance (ANOVA) implemented in the R stats package (R Core Team, 2018) was used to test for differences in HE and FIS among the geographic ranges.
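The comparison among geographic ranges rests on a one-way ANOVA. The sketch below computes the F statistic and its degrees of freedom from first principles, assuming one HE (or FIS) value per population grouped by range. It is not the R code used in the study and omits the p-value, which would come from the F distribution; the function name and the values are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [v for group in groups for v in group]
    grand_mean = mean(values)
    # Between-group sum of squares, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares around each group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-population values for three geographic ranges.
toy_ranges = [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4], [0.5, 0.6, 0.7]]
f_stat, df_between, df_within = one_way_anova_f(toy_ranges)
```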

Supplementary Table 1.
Location and grouping information of the urban populations in East China cities. Cluster, the clusters identified by GENELAND.

Figure 2.
Phylogenetic relationships of cpDNA haplotypes. a. Rooted neighbor-joining tree based on phylogenetic analyses of cpDNA (length = 802 bp). Bootstrap values are shown at nodes. b. Evolutionary paths of cpDNA haplotypes in sequenced plants; circle size represents the number of plants with a given haplotype, circle color indicates the haplotype as in Figure 1, and branch length indicates the mutation distance.

a. Timeline of the posterior density of the number of clusters. b. Histogram of the density of the number of clusters. c. GENELAND, k = 7. Map of the estimated posterior probability of common ragweed population membership (by posterior mode), using the spatial model with uncorrelated allele frequencies. Abbreviations are names of sampled cities and populations: Mudanjiang (MDJ), Changchun (CC), Shenyang (SY), Fushun (FS), Qinhuangdao (QHD), Beijing (BJ), Qingdao (QD), Shanghai (SH), Nanjing (NJ), Wuhan (WH), Changsha (CS), Fuzhou (FZ), Guangzhou (GZ), Chongqing (CQ), Guiyang (GY).

Table 2.
Gene flow among the 15 sampled cities. Gene flow is directed from column to row; the color gradation indicates increasing migration rate.